Language Models are Few-Shot Learners

GPT-3 paper ( #GPT-3 )

Figure 1.2

Larger models make increasingly efficient use of in-context information.

Few-shot prompting improves performance. The larger the model, the larger the gain from few-shot examples.

Figure 2.1

Instruction only (Zero-Shot)

Add demonstrations (Few-Shot)

In-Context Learning

The model's parameters are not updated, yet it appears to learn from the context.

Performance improves as the number of few-shot examples increases (e.g., Figure 1.3).
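The difference between zero-shot, one-shot, and few-shot prompts can be sketched roughly as follows. This is only an illustration: `build_prompt`, the `=>` separator, and the translation examples are assumptions chosen for this note, not the paper's actual evaluation code. The point is that the only thing that changes is the text placed in the context; no parameters are updated.

```python
from typing import List, Tuple


def build_prompt(
    task_description: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble an in-context learning prompt as plain text.

    Hypothetical helper: the layout (one example per line, "=>" separator)
    is an assumption made for illustration.
    """
    lines = [task_description]
    # Each demonstration is a solved input/output pair shown to the model.
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    # The final line is left open for the model to complete.
    lines.append(f"{query} =>")
    return "\n".join(lines)


TASK = "Translate English to French:"
DEMOS = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]

zero_shot = build_prompt(TASK, [], "cheese")        # instruction only
one_shot = build_prompt(TASK, DEMOS[:1], "cheese")  # one demonstration
few_shot = build_prompt(TASK, DEMOS, "cheese")      # several demonstrations

print(few_shot)
```

Increasing the number of demonstrations only lengthens the prompt, so the number of examples is bounded by the model's context window.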

Table 2.2: Datasets used to train GPT-3

About 500 billion tokens

(Equivalent to 5 million books, assuming 100,000 tokens per book)
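The book-equivalent figure is a back-of-the-envelope division. A minimal check, where 100,000 tokens per book is the rough assumption made in this note rather than a number from the paper:

```python
# Total size of the training datasets, rounded to 500 billion tokens.
TOTAL_TOKENS = 500_000_000_000

# Assumption from this note: an average book is about 100,000 tokens.
TOKENS_PER_BOOK = 100_000

books = TOTAL_TOKENS // TOKENS_PER_BOOK
print(f"{books:,} books")  # 5,000,000 books
```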

Figure 3.1

Confirms that validation loss keeps decreasing as training compute increases

H Results on All Tasks for All Model Sizes

Evaluation on downstream tasks

Evaluated via In-Context Learning

Zero-Shot

One-Shot

Few-Shot

(This choice makes sense, since the goal is a general-purpose LLM)